Understanding Customer Problem Troubleshooting from Storage System Logs
Authors
Abstract
Customer problem troubleshooting has long been a critically important issue for both customers and system providers. This paper makes two major contributions toward understanding it. First, it provides one of the first characteristic studies of customer problem troubleshooting, using a large set (636,108) of real-world customer cases reported from 100,000 commercially deployed storage systems in the last two years. We study the characteristics of customer problem troubleshooting along various dimensions, as well as the correlations among them. Our results show that while some failures are either benign or resolved automatically, many others can take hours or days of manual diagnosis to fix. For modern storage systems, hardware failures and misconfigurations dominate customer cases, but software failures take longer to resolve. Interestingly, a relatively significant percentage of cases arise because customers lack sufficient knowledge about the system. We observe that customer problems with attached system logs are invariably resolved much faster than those without logs. Second, we evaluate the potential of using storage system logs to resolve these problems. Our analysis shows that a failure message alone is a poor indicator of root cause, and that combining failure messages with multiple log events can improve low-level root cause prediction by a factor of three. We then discuss the challenges in log analysis and possible solutions.
Similar resources
CLUEBOX: A Performance Log Analyzer for Automated Troubleshooting
Performance problems in complex systems are often caused by underprovisioning, workload interference, incorrect expectations or bugs. Troubleshooting such systems is a difficult task faced by service engineers. We have built CLUEBOX, a non-intrusive toolkit that aids rapid problem diagnosis. It employs machine learning techniques on the available performance logs to characterize workloads, pred...
Development of an Intelligent System to Synthesize Petrophysical Well Logs
Porosity is one of the fundamental petrophysical properties that should be evaluated for hydrocarbon-bearing reservoirs. It is a vital factor in precisely understanding reservoir quality in a hydrocarbon field. Log data are crucial in the petroleum industry, because many hydrocarbon parameters are derived from petrophysical data. There are three main petrophysical...
CANASTA: The Crash Analysis Troubleshooting Assistant
CANASTA (crash analysis troubleshooting assistant) is a Digital proprietary knowledge-based system developed by the Artificial Intelligence Applications Group (AIAG) at Digital Equipment Corporation in collaboration with Digital's customer support centers (CSCs). It is targeted to assist computer support engineers at CSCs in analyzing operating system crashes, traditionally one of the most comp...
Troubleshooting at the Call Centre: A Knowledge-based Approach
The key focus of the customer call-centre is effective and efficient resolution of customer problems. Keeping staff and clients happy by streamlining call-centre workflow is integral in achieving this end. We propose to extend a knowledge representation and acquisition technique, known as multiple classification ripple down rules (MCRDR), that will support management of troubleshooting knowledg...
- lem troubleshooting and system logs
Arkady Kanevsky is a senior research engineer at NetApp Advanced Technology Group. Arkady has done extensive research on RDMA technology, storage resiliency, scalable storage systems, and parallel and distributed computing. He received a Ph.D. in computer science from the University of Illinois in 1987. He was a faculty member at Dartmouth College and Texas A&M University prior to joining the i...